{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# How to improve results with prompting\n",
        "\n",
        "In this guide we’ll go over prompting strategies to improve graph database query generation. We’ll largely focus on methods for getting relevant database-specific information in your prompt.\n",
        "\n",
        "```{=mdx}\n",
        ":::warning\n",
        "\n",
        "The `GraphCypherQAChain` used in this guide will execute Cypher statements against the provided database.\n",
        "For production, make sure that the database connection uses credentials that are narrowly-scoped to only include necessary permissions.\n",
        "\n",
        "Failure to do so may result in data corruption or loss, since the calling code\n",
        "may attempt commands that would result in deletion, mutation of data\n",
        "if appropriately prompted or reading sensitive data if such data is present in the database.\n",
        "\n",
        ":::\n",
        "```"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Setup\n",
        "#### Install dependencies\n",
        "\n",
        "```{=mdx}\n",
        "import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n",
        "import Npm2Yarn from \"@theme/Npm2Yarn\";\n",
        "\n",
        "<IntegrationInstallTooltip></IntegrationInstallTooltip>\n",
        "\n",
        "<Npm2Yarn>\n",
        "  langchain @langchain/community @langchain/openai @langchain/core neo4j-driver\n",
        "</Npm2Yarn>\n",
        "```\n",
        "\n",
        "#### Set environment variables\n",
        "\n",
        "We'll use OpenAI in this example:\n",
        "\n",
        "```env\n",
        "OPENAI_API_KEY=your-api-key\n",
        "\n",
        "# Optional, use LangSmith for best-in-class observability\n",
        "LANGSMITH_API_KEY=your-api-key\n",
        "LANGSMITH_TRACING=true\n",
        "\n",
        "# Reduce tracing latency if you are not in a serverless environment\n",
        "# LANGCHAIN_CALLBACKS_BACKGROUND=true\n",
        "```\n",
        "\n",
        "Next, we need to define Neo4j credentials.\n",
        "Follow [these installation steps](https://neo4j.com/docs/operations-manual/current/installation/) to set up a Neo4j database.\n",
        "\n",
        "```env\n",
        "NEO4J_URI=\"bolt://localhost:7687\"\n",
        "NEO4J_USERNAME=\"neo4j\"\n",
        "NEO4J_PASSWORD=\"password\"\n",
        "```"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The below example will create a connection with a Neo4j database and will populate it with example data about movies and their actors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {},
      "outputs": [],
      "source": [
        "const url = process.env.NEO4J_URI;\n",
        "const username = process.env.NEO4J_USER;\n",
        "const password = process.env.NEO4J_PASSWORD;"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Schema refreshed successfully.\n"
          ]
        },
        {
          "data": {
            "text/plain": [
              "[]"
            ]
          },
          "execution_count": 2,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "import \"neo4j-driver\";\n",
        "import { Neo4jGraph } from \"@langchain/community/graphs/neo4j_graph\";\n",
        "\n",
        "const graph = await Neo4jGraph.initialize({ url, username, password });\n",
        "\n",
        "// Import movie information\n",
        "const moviesQuery = `LOAD CSV WITH HEADERS FROM \n",
        "'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv'\n",
        "AS row\n",
        "MERGE (m:Movie {id:row.movieId})\n",
        "SET m.released = date(row.released),\n",
        "    m.title = row.title,\n",
        "    m.imdbRating = toFloat(row.imdbRating)\n",
        "FOREACH (director in split(row.director, '|') | \n",
        "    MERGE (p:Person {name:trim(director)})\n",
        "    MERGE (p)-[:DIRECTED]->(m))\n",
        "FOREACH (actor in split(row.actors, '|') | \n",
        "    MERGE (p:Person {name:trim(actor)})\n",
        "    MERGE (p)-[:ACTED_IN]->(m))\n",
        "FOREACH (genre in split(row.genres, '|') | \n",
        "    MERGE (g:Genre {name:trim(genre)})\n",
        "    MERGE (m)-[:IN_GENRE]->(g))`\n",
        "\n",
        "await graph.query(moviesQuery);"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Filtering graph schema\n",
        "\n",
        "At times, you may need to focus on a specific subset of the graph schema while generating Cypher statements.\n",
        "Let's say we are dealing with the following graph schema:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Node properties are the following:\n",
            "Movie {imdbRating: FLOAT, id: STRING, released: DATE, title: STRING}, Person {name: STRING}, Genre {name: STRING}, Chunk {embedding: LIST, id: STRING, text: STRING}\n",
            "Relationship properties are the following:\n",
            "\n",
            "The relationships are the following:\n",
            "(:Movie)-[:IN_GENRE]->(:Genre), (:Person)-[:DIRECTED]->(:Movie), (:Person)-[:ACTED_IN]->(:Movie)\n"
          ]
        }
      ],
      "source": [
        "await graph.refreshSchema()\n",
        "console.log(graph.getSchema())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Few-shot examples\n",
        "\n",
        "Including examples of natural language questions being converted to valid Cypher queries against our database in the prompt will often improve model performance, especially for complex queries.\n",
        "\n",
        "Let's say we have the following examples:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {},
      "outputs": [],
      "source": [
        "const examples = [\n",
        "    {\n",
        "        \"question\": \"How many artists are there?\",\n",
        "        \"query\": \"MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)\",\n",
        "    },\n",
        "    {\n",
        "        \"question\": \"Which actors played in the movie Casino?\",\n",
        "        \"query\": \"MATCH (m:Movie {{title: 'Casino'}})<-[:ACTED_IN]-(a) RETURN a.name\",\n",
        "    },\n",
        "    {\n",
        "        \"question\": \"How many movies has Tom Hanks acted in?\",\n",
        "        \"query\": \"MATCH (a:Person {{name: 'Tom Hanks'}})-[:ACTED_IN]->(m:Movie) RETURN count(m)\",\n",
        "    },\n",
        "    {\n",
        "        \"question\": \"List all the genres of the movie Schindler's List\",\n",
        "        \"query\": \"MATCH (m:Movie {{title: 'Schindler\\\\'s List'}})-[:IN_GENRE]->(g:Genre) RETURN g.name\",\n",
        "    },\n",
        "    {\n",
        "        \"question\": \"Which actors have worked in movies from both the comedy and action genres?\",\n",
        "        \"query\": \"MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name\",\n",
        "    },\n",
        "    {\n",
        "        \"question\": \"Which directors have made movies with at least three different actors named 'John'?\",\n",
        "        \"query\": \"MATCH (d:Person)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Person) WHERE a.name STARTS WITH 'John' WITH d, COUNT(DISTINCT a) AS JohnsCount WHERE JohnsCount >= 3 RETURN d.name\",\n",
        "    },\n",
        "    {\n",
        "        \"question\": \"Identify movies where directors also played a role in the film.\",\n",
        "        \"query\": \"MATCH (p:Person)-[:DIRECTED]->(m:Movie), (p)-[:ACTED_IN]->(m) RETURN m.title, p.name\",\n",
        "    },\n",
        "    {\n",
        "        \"question\": \"Find the actor with the highest number of movies in the database.\",\n",
        "        \"query\": \"MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) RETURN a.name, COUNT(m) AS movieCount ORDER BY movieCount DESC LIMIT 1\",\n",
        "    },\n",
        "]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can create a few-shot prompt with them like so:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {},
      "outputs": [],
      "source": [
        "import { FewShotPromptTemplate, PromptTemplate } from \"@langchain/core/prompts\";\n",
        "\n",
        "const examplePrompt = PromptTemplate.fromTemplate(\n",
        "    \"User input: {question}\\nCypher query: {query}\"\n",
        ")\n",
        "const prompt = new FewShotPromptTemplate({\n",
        "    examples: examples.slice(0, 5),\n",
        "    examplePrompt,\n",
        "    prefix: \"You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.\\n\\nHere is the schema information\\n{schema}.\\n\\nBelow are a number of examples of questions and their corresponding Cypher queries.\",\n",
        "    suffix: \"User input: {question}\\nCypher query: \",\n",
        "    inputVariables: [\"question\", \"schema\"],\n",
        "})"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.\n",
            "\n",
            "Here is the schema information\n",
            "foo.\n",
            "\n",
            "Below are a number of examples of questions and their corresponding Cypher queries.\n",
            "\n",
            "User input: How many artists are there?\n",
            "Cypher query: MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)\n",
            "\n",
            "User input: Which actors played in the movie Casino?\n",
            "Cypher query: MATCH (m:Movie {title: 'Casino'})<-[:ACTED_IN]-(a) RETURN a.name\n",
            "\n",
            "User input: How many movies has Tom Hanks acted in?\n",
            "Cypher query: MATCH (a:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN count(m)\n",
            "\n",
            "User input: List all the genres of the movie Schindler's List\n",
            "Cypher query: MATCH (m:Movie {title: 'Schindler\\'s List'})-[:IN_GENRE]->(g:Genre) RETURN g.name\n",
            "\n",
            "User input: Which actors have worked in movies from both the comedy and action genres?\n",
            "Cypher query: MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name\n",
            "\n",
            "User input: How many artists are there?\n",
            "Cypher query: \n"
          ]
        }
      ],
      "source": [
        "console.log(await prompt.format({ question: \"How many artists are there?\", schema: \"foo\" }))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Dynamic few-shot examples\n",
        "\n",
        "If we have enough examples, we may want to only include the most relevant ones in the prompt, either because they don't fit in the model's context window or because the long tail of examples distracts the model. And specifically, given any input we want to include the examples most relevant to that input.\n",
        "\n",
        "We can do just this using an ExampleSelector. In this case we'll use a [SemanticSimilarityExampleSelector](https://api.js.langchain.com/classes/langchain_core.example_selectors.SemanticSimilarityExampleSelector.html), which will store the examples in the vector database of our choosing. At runtime it will perform a similarity search between the input and our examples, and return the most semantically similar ones: "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {},
      "outputs": [],
      "source": [
        "import { OpenAIEmbeddings } from \"@langchain/openai\";\n",
        "import { SemanticSimilarityExampleSelector } from \"@langchain/core/example_selectors\";\n",
        "import { Neo4jVectorStore } from \"@langchain/community/vectorstores/neo4j_vector\";\n",
        "\n",
        "const exampleSelector = await SemanticSimilarityExampleSelector.fromExamples(\n",
        "    examples,\n",
        "    new OpenAIEmbeddings(),\n",
        "    Neo4jVectorStore,\n",
        "    {\n",
        "        k: 5,\n",
        "        inputKeys: [\"question\"],\n",
        "        preDeleteCollection: true,\n",
        "        url,\n",
        "        username,\n",
        "        password\n",
        "    }\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "[\n",
              "  {\n",
              "    query: \u001b[32m\"MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)\"\u001b[39m,\n",
              "    question: \u001b[32m\"How many artists are there?\"\u001b[39m\n",
              "  },\n",
              "  {\n",
              "    query: \u001b[32m\"MATCH (a:Person {{name: 'Tom Hanks'}})-[:ACTED_IN]->(m:Movie) RETURN count(m)\"\u001b[39m,\n",
              "    question: \u001b[32m\"How many movies has Tom Hanks acted in?\"\u001b[39m\n",
              "  },\n",
              "  {\n",
              "    query: \u001b[32m\"MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE\"\u001b[39m... 84 more characters,\n",
              "    question: \u001b[32m\"Which actors have worked in movies from both the comedy and action genres?\"\u001b[39m\n",
              "  },\n",
              "  {\n",
              "    query: \u001b[32m\"MATCH (d:Person)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Person) WHERE a.name STARTS WITH 'John' WITH\"\u001b[39m... 71 more characters,\n",
              "    question: \u001b[32m\"Which directors have made movies with at least three different actors named 'John'?\"\u001b[39m\n",
              "  },\n",
              "  {\n",
              "    query: \u001b[32m\"MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) RETURN a.name, COUNT(m) AS movieCount ORDER BY movieCount DES\"\u001b[39m... 9 more characters,\n",
              "    question: \u001b[32m\"Find the actor with the highest number of movies in the database.\"\u001b[39m\n",
              "  }\n",
              "]"
            ]
          },
          "execution_count": 8,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "await exampleSelector.selectExamples({ question: \"how many artists are there?\" })"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To use it, we can pass the ExampleSelector directly in to our FewShotPromptTemplate:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {},
      "outputs": [],
      "source": [
        "const promptWithExampleSelector = new FewShotPromptTemplate({\n",
        "  exampleSelector,\n",
        "  examplePrompt,\n",
        "  prefix: \"You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.\\n\\nHere is the schema information\\n{schema}.\\n\\nBelow are a number of examples of questions and their corresponding Cypher queries.\",\n",
        "  suffix: \"User input: {question}\\nCypher query: \",\n",
        "  inputVariables: [\"question\", \"schema\"],\n",
        "})"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 20,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.\n",
            "\n",
            "Here is the schema information\n",
            "foo.\n",
            "\n",
            "Below are a number of examples of questions and their corresponding Cypher queries.\n",
            "\n",
            "User input: How many artists are there?\n",
            "Cypher query: MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)\n",
            "\n",
            "User input: How many movies has Tom Hanks acted in?\n",
            "Cypher query: MATCH (a:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN count(m)\n",
            "\n",
            "User input: Which actors have worked in movies from both the comedy and action genres?\n",
            "Cypher query: MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name\n",
            "\n",
            "User input: Which directors have made movies with at least three different actors named 'John'?\n",
            "Cypher query: MATCH (d:Person)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Person) WHERE a.name STARTS WITH 'John' WITH d, COUNT(DISTINCT a) AS JohnsCount WHERE JohnsCount >= 3 RETURN d.name\n",
            "\n",
            "User input: Find the actor with the highest number of movies in the database.\n",
            "Cypher query: MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) RETURN a.name, COUNT(m) AS movieCount ORDER BY movieCount DESC LIMIT 1\n",
            "\n",
            "User input: how many artists are there?\n",
            "Cypher query: \n"
          ]
        }
      ],
      "source": [
        "console.log(await promptWithExampleSelector.format({ question: \"how many artists are there?\", schema: \"foo\" }))"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 17,
      "metadata": {},
      "outputs": [],
      "source": [
        "import { ChatOpenAI } from \"@langchain/openai\";\n",
        "import { GraphCypherQAChain } from \"langchain/chains/graph_qa/cypher\";\n",
        "\n",
        "const llm = new ChatOpenAI({\n",
        "  model: \"gpt-3.5-turbo\",\n",
        "  temperature: 0,\n",
        "});\n",
        "const chain = GraphCypherQAChain.fromLLM(\n",
        "  {\n",
        "    graph,\n",
        "    llm,\n",
        "    cypherPrompt: promptWithExampleSelector,\n",
        "  }\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 19,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{ result: \u001b[32m\"There are 967 actors in the graph.\"\u001b[39m }"
            ]
          },
          "execution_count": 19,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "await chain.invoke({\n",
        "  query: \"How many actors are in the graph?\"\n",
        "})"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Deno",
      "language": "typescript",
      "name": "deno"
    },
    "language_info": {
      "file_extension": ".ts",
      "mimetype": "text/x.typescript",
      "name": "typescript",
      "nb_converter": "script",
      "pygments_lexer": "typescript",
      "version": "5.3.3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
}
